Greek Named Entity Recognition using Support Vector Machines, Maximum Entropy and Onetime
Authors
Abstract
We describe our work on Greek Named Entity Recognition, comparing three machine learning techniques: (i) Support Vector Machines (SVM), (ii) Maximum Entropy, and (iii) Onetime, a shortcut method based on previous work by one of the authors. Most of our system’s features draw on linguistic knowledge from morphology, punctuation, the position of lexical units within a sentence and within a text, electronic dictionaries, and the outputs of external tools (a tokenizer, a sentence splitter, and a Hellenic version of Brill’s Part of Speech Tagger). After testing, we observed that applying a few simple Post Testing Classification Correction (PTCC) rules, written after inspecting the output errors, improved the output of the SVM and Maximum Entropy systems. We achieved very good results with all three methods. Our best configurations (Support Vector Machines with a second-degree polynomial kernel, and Maximum Entropy) both achieved an overall F-measure of 91.06 after the application of PTCC rules.
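To make the feature families and the post-classification correction step more concrete, here is a minimal Python sketch. It is not the authors' implementation: the feature names, the `gazetteer` argument (standing in for the electronic dictionaries), and the label-propagation rule in `propagate_labels` are all assumptions chosen for illustration, since the abstract does not list the actual features or PTCC rules.

```python
import string


def _is_punct(tok):
    """True if the token is non-empty and made only of ASCII punctuation."""
    return bool(tok) and all(ch in string.punctuation for ch in tok)


def token_features(tokens, i, gazetteer=frozenset()):
    """Build an illustrative feature dict for tokens[i] in one sentence.

    The features loosely mirror the families named in the abstract
    (morphology, punctuation, position, dictionaries); the exact
    choices here are hypothetical.
    """
    tok = tokens[i]
    return {
        "is_capitalized": tok[:1].isupper(),       # morphology
        "is_all_caps": tok.isupper(),
        "has_digit": any(ch.isdigit() for ch in tok),
        "suffix3": tok[-3:].lower(),
        "is_punct": _is_punct(tok),                # punctuation
        "prev_is_punct": i > 0 and _is_punct(tokens[i - 1]),
        "is_sentence_initial": i == 0,             # position
        "in_gazetteer": tok in gazetteer,          # dictionary lookup
    }


def propagate_labels(tokens, labels, outside="O"):
    """Hypothetical correction rule applied after classification.

    If a capitalized token received an entity label somewhere in the
    text, copy that label to occurrences the classifier left unlabelled.
    """
    seen = {}
    for tok, lab in zip(tokens, labels):
        if lab != outside and tok[:1].isupper():
            seen.setdefault(tok, lab)  # keep the first label seen
    return [
        seen.get(tok, lab) if lab == outside else lab
        for tok, lab in zip(tokens, labels)
    ]
```

In a full system, feature dicts like these would be vectorized and passed to a classifier such as an SVM or a Maximum Entropy model, and rules of this kind would run on the predicted labels afterwards.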
Similar resources
Named Entity Recognition using Maximum Entropy Models on Biologists’ Literature
With the explosion of online biomedical texts, it has become more difficult to obtain exact information manually. Named entity recognition is the very first step for further text mining tasks such as information extraction, knowledge discovery, and others. In this paper, we present our statistical named entity recognition method. Until now, there were some approaches using different statistic...
Tuning support vector machines for biomedical named entity recognition
We explore the use of Support Vector Machines (SVMs) for biomedical named entity recognition. To make SVM training tractable on the largest available corpus, the GENIA corpus, we propose to split the non-entity class into sub-classes, using part-of-speech information. In addition, we explore new features such as word cache and the states of an HMM trained by unsupervised learning. Expe...
A Greek Named-Entity Recognizer That Uses Support Vector Machines and Active Learning
We present a named-entity recognizer for Greek person names and temporal expressions. For temporal expressions, it relies on semi-automatically produced patterns. For person names, it employs two Support Vector Machines that scan the input text in two passes, and active learning, which reduces the human annotation effort during training.
Named Entity Recognition in Greek Texts with an Ensemble of SVMs and Active Learning
We present a freely available named-entity recognizer for Greek texts that identifies temporal expressions, person, and organization names. For temporal expressions, it relies on semi-automatically produced patterns. For person and organization names, it employs an ensemble of Support Vector Machines that scan the input text in two passes. The ensemble is trained using active learning, whereby ...
Named Entity Recognition for Indian Languages: A Survey
Named Entity Recognition (NER) is a subtask of Information Extraction (IE) used to identify and classify the names in any given data. Earlier studies were mostly based on handwritten rules, whereas nowadays Machine Learning models such as Hidden Markov Model (HMM), Maximum Entropy (MaxEnt), Maximum Entropy Markov model (MEMM), Support Vector Machine (SVM), Conditional Random Fields (CRFs) a...